What are biplots?

  • The biplot is a powerful and widely used data visualisation tool.

  • Biplots make the information in a data table transparent, revealing the main structures in the data in a methodical way, for example patterns of correlation between variables or similarities between observations.

  • A biplot is a generalisation of a two-dimensional scatter diagram of data that exists in a higher dimensional space, where information on both samples and variables can be displayed graphically.

  • There are different types of biplots, based on various multivariate data analysis techniques.

Flow of functions in biplotEZ

First step to create a biplot

biplot(data=iris, 
       group.aes = iris[,5],
       Title="My first biplot")
# Object of class biplot, based on 150 samples and 5 variables.
# 4 numeric variables.
# 1 categorical variable.
Argument Description
data a data frame or matrix containing all variables the user wants to analyse.
classes a vector identifying class membership. Required for CVA biplots.
group.aes a variable from the data to be used as a grouping variable.
center a logical value indicating whether the data should be column centered, with default TRUE.
scaled a logical value indicating whether the data should be standardised to unit column variances, with default FALSE.
Title title of the biplot to be rendered.

Type of biplot: PCA

PCA()
Argument Description
bp Object of class biplot.
dim.biplot Dimension of the biplot. Only values 1, 2 and 3 are accepted, with default 2.
e.vects Which eigenvectors (principal components) to extract, with default 1:dim.biplot.
group.aes a grouping variable, if not already specified in biplot().
show.class.means T or F: whether group means should be plotted in the biplot.
correlation.biplot T or F: whether distances or correlations between the variables are optimally approximated.

Construction of PCA biplot

  • Consider a data matrix \({\bf{X}}^*\) of size \(n \times p\).
  • Using the Iris data as an example, there are \(n=150\) observations measured across \(p=4\) variables.
tibble(iris)
# # A tibble: 150 × 5
#    Sepal.Length Sepal.Width Petal.Length
#           <dbl>       <dbl>        <dbl>
#  1          5.1         3.5          1.4
#  2          4.9         3            1.4
#  3          4.7         3.2          1.3
#  4          4.6         3.1          1.5
#  5          5           3.6          1.4
#  6          5.4         3.9          1.7
#  7          4.6         3.4          1.4
#  8          5           3.4          1.5
#  9          4.4         2.9          1.4
# 10          4.9         3.1          1.5
# # ℹ 140 more rows
# # ℹ 2 more variables: Petal.Width <dbl>,
# #   Species <fct>
  • To produce a biplot, we need to optimally approximate \({\bf{X}} = ({\bf{I}}_n - \frac{1}{n}{\bf{11}}'){\bf{X}}^*\).
  • We want to minimise \(|| {\hat{\bf{X}}} - {\bf{X}}||^2\).
  • The best approximation minimising this least squares criterion is the \(r\)-dimensional Eckart-Young approximation \({\bf{\hat{X}}}_{[r]} = {\bf{U}} {\bf{D}}_{[r]} {\bf{V}}'\).
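These steps can be sketched in base R with svd(), using the iris measurements as above (a minimal illustration, not the biplotEZ internals):

```r
# Centre the measurements: X = (I - (1/n) 11') X*
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)

# Singular value decomposition X = U D V'
udv <- svd(X)

# Rank-2 Eckart-Young approximation: keep only the first two singular values
X.hat2 <- udv$u[, 1:2] %*% diag(udv$d[1:2]) %*% t(udv$v[, 1:2])

# Same size as X; the squared error equals the sum of the discarded d^2
dim(X.hat2)
```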

Representing samples

A standard result when \(r = 2\) is that the row vectors of \({\bf{\hat{X}}}_{[2]}\) are the orthogonal projections of the corresponding row vectors of \({\bf{X}}\) onto the column space of \({\bf{V}}_2\).

These projections are also known as the first two principal components.
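This result can be checked numerically with the svd() of the centred matrix (a quick sketch, not biplotEZ code):

```r
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
udv <- svd(X)
V2 <- udv$v[, 1:2]

# Scores: projections of the rows of X onto the column space of V2
Z <- X %*% V2

# The rank-2 approximation is exactly these projections mapped back: Z V2'
X.hat2 <- udv$u[, 1:2] %*% diag(udv$d[1:2]) %*% t(V2)
all.equal(Z %*% t(V2), X.hat2)  # TRUE
```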

Representing variables

The columns of \({\bf{X}}\) are approximated via \({\bf{V}}_2\), the first two columns of \({\bf{V}}\), whose rows define the biplot axes for the variables.

The arrows representing the variables in the data can be calibrated to display marker points analogous to ordinary scatterplots.
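As a sketch of how such calibration works in the standard PCA biplot construction (this is not the biplotEZ calibration routine): the axis for variable \(j\) runs through the origin in the direction of \(v_j\), row \(j\) of \({\bf{V}}_2\), and the marker for a raw value \(\mu\) sits at \((\mu - \bar{x}_j)\, v_j / (v_j' v_j)\).

```r
X.star <- as.matrix(iris[, 1:4])
X <- scale(X.star, center = TRUE, scale = FALSE)
V2 <- svd(X)$v[, 1:2]

# Direction of the Sepal.Length axis: row 1 of V2
v1 <- V2[1, ]

# Marker positions for round values of Sepal.Length
mu <- pretty(X.star[, 1])
markers <- outer(mu - mean(X.star[, 1]), v1 / sum(v1^2))

# Projecting a marker back onto v1 recovers its original value
drop(markers %*% v1) + mean(X.star[, 1])  # equals mu
```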

PCA biplot

biplot(data=iris, 
       group.aes = iris[,5],
       Title="My first biplot") |> PCA() |> plot()

Aesthetics: samples()

Change the colour, plotting character and character expansion of the samples.

biplot(iris, group.aes = iris[,5]) |> 
  PCA() |> 
  samples(col = c("orange","purple","gold"), pch = c(15,1,17), cex = 1.2, opacity = 0.6) |> 
  plot()

Notice that the aesthetics in samples() are applied per group, as specified by the group.aes argument. Here there are three groups.

Aesthetics: samples()

Select certain groups and add labels to the samples.

biplot(iris, group.aes = iris[,5]) |> 
  PCA() |> 
  samples(which = c(1,2), col = c("orange","purple"), label = TRUE) |> 
  plot()

Aesthetics: samples()

Other arguments

Argument Description
label.col Colour of labels
label.cex Text expansion of the labels
label.side Side at which the label of the plotted point appears - “bottom” (default), “top”, “left”, “right”
label.offset Offset of the label from the plotted point
connected T or F: whether samples are connected
connect.col Colour of the connecting line
connect.lty Line type of the connecting line
connect.lwd Line width of the connecting line

Aesthetics: axes()

Change the colour and line width of the axes

biplot(iris[,1:4]) |> PCA() |> samples(col = "grey", opacity = 0.5) |>
  axes(col = "rosybrown", label.dir = "Orthog", lwd = 2) |> plot()

Aesthetics: axes()

Show the first two axes with vector representation and unit circle

biplot(iris[,1:4]) |> PCA() |> samples(col = "grey", opacity = 0.5) |>
  axes(which = 1:2, col = "rosybrown", vectors = TRUE, unit.circle = TRUE) |> plot()

Aesthetics: axes()

Other arguments

Axis labels: ax.names, label.dir, label.col, label.cex, label.line, label.offset
Ticks: ticks, tick.size, tick.label, tick.label.side, tick.label.col
Prediction: predict.col, predict.lwd, predict.lty
Orthogonal: orthogx, orthogy

Prediction of samples

prediction()

out <- biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> 
  samples(col = c("orange","purple","gold"), opacity = 0.5) |>
  prediction(predict.samples = c(1:2, 51:52, 101:102)) |>
  axes(predict.col = "red", predict.lwd = 1.5, predict.lty = 2) |> plot()

Prediction of samples

Prediction only on Sepal.Length: use the which argument.

biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> 
  samples(col = c("orange","purple","gold"), opacity = 0.5) |>
  prediction(predict.samples = c(1:2, 51:52, 101:102), which = "Sepal.Length") |>
  axes(predict.col = "red", predict.lwd = 1.5, predict.lty = 2) |> plot()

Prediction of group means

biplot(iris[,1:4], group.aes = iris[,5]) |> PCA(show.class.means = TRUE) |> 
  samples(col = c("orange","purple","gold"), opacity = 0.5) |>
  prediction(predict.means = TRUE) |>
  axes(predict.col = "red", predict.lwd = 1.5, predict.lty = 2) |> plot()

Predictions

summary(out)
# Object of class biplot, based on 150 samples and 4 variables.
# 4 numeric variables.
# 
# Sample predictions
#     Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1       5.083039    3.517414     1.403214   0.2135317
# 2       4.746262    3.157500     1.463562   0.2402459
# 51      6.757521    3.449014     4.739884   1.6079559
# 52      6.389336    3.210952     4.501645   1.5094058
# 101     6.751606    2.836199     5.928106   2.1069758
# 102     5.977297    2.517932     5.070066   1.7497923

Interpolation of samples

biplot(iris[1:100,]) |> PCA() |> 
  interpolate(newdata = iris[101:150,]) |> 
  newsamples(col = "red") |> plot()

Interpolation of axes

biplot(iris[,1:3]) |> PCA() |> 
    interpolate(newdata = NULL, newvariable = iris[,4]) |> 
    newaxes(X.new.names = "Petal.Width") |> plot()

Translation

Automatically or manually translate the axes away from the center of the plot.

biplot(iris) |> 
      PCA(group.aes = iris[,5]) |> 
      translate_axes(swop = TRUE, delta = 0.2) |>
      plot(exp.factor = 3)

Density plots

biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> 
  density2D(which = 1, col = c("white","purple","cyan","blue")) |> plot()

Fit measures

out2 <- biplot(iris[,1:4], group.aes = iris[,5]) |> PCA() |> fit.measures()
summary(out2)
# Object of class biplot, based on 150 samples and 4 variables.
# 4 numeric variables.
# 
# Quality of fit in 2 dimension(s) = 97.8% 
# Adequacy of variables in 2 dimension(s):
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#    0.5617091    0.5402798    0.7639426    0.1340685 
# Axis predictivity in 2 dimension(s):
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#    0.9579017    0.8400028    0.9980931    0.9365937 
# Sample predictivity in 2 dimension(s):
#         1         2         3         4         5         6         7         8 
# 0.9998927 0.9927400 0.9999141 0.9991226 0.9984312 0.9949770 0.9914313 0.9996346 
#         9        10        11        12        13        14        15        16 
# 0.9998677 0.9941340 0.9991205 0.9949153 0.9945491 0.9996034 0.9942676 0.9897890 
#        17        18        19        20        21        22        23        24 
# 0.9937752 0.9990534 0.9972926 0.9928624 0.9896250 0.9932656 0.9918132 0.9955885 
#        25        26        27        28        29        30        31        32 
# 0.9812917 0.9897303 0.9979903 0.9990514 0.9963870 0.9975607 0.9985741 0.9876345 
#        33        34        35        36        37        38        39        40 
# 0.9833383 0.9957412 0.9970200 0.9935405 0.9859750 0.9953399 0.9994047 0.9990244 
#        41        42        43        44        45        46        47        48 
# 0.9980903 0.9756895 0.9953372 0.9830035 0.9763861 0.9959863 0.9905695 0.9987006 
#        49        50        51        52        53        54        55        56 
# 0.9996383 0.9987482 0.9275369 0.9996655 0.9544488 0.9460515 0.9172857 0.9061058 
#        57        58        59        60        61        62        63        64 
# 0.9727694 0.9996996 0.8677939 0.8686502 0.9613130 0.9328852 0.4345132 0.9679973 
#        65        66        67        68        69        70        71        72 
# 0.7995848 0.9083037 0.7968614 0.5835260 0.7900027 0.8575646 0.8524748 0.6615410 
#        73        74        75        76        77        78        79        80 
# 0.9367709 0.8661203 0.8350955 0.8929908 0.8702600 0.9873164 0.9969031 0.6815512 
#        81        82        83        84        85        86        87        88 
# 0.8937189 0.8409681 0.7829405 0.9848354 0.6901625 0.8073582 0.9666041 0.6665514 
#        89        90        91        92        93        94        95        96 
# 0.6993846 0.9909923 0.9008345 0.9710941 0.8037223 0.9913632 0.9744493 0.7089660 
#        97        98        99       100       101       102       103       104 
# 0.9071738 0.9064541 0.9625371 0.9872279 0.9171603 0.9636413 0.9976224 0.9829885 
#       105       106       107       108       109       110       111       112 
# 0.9854704 0.9888092 0.8464463 0.9729353 0.9771293 0.9794313 0.9746239 0.9977302 
#       113       114       115       116       117       118       119       120 
# 0.9941859 0.9605563 0.8476794 0.9289985 0.9929982 0.9916850 0.9818957 0.9493751 
#       121       122       123       124       125       126       127       128 
# 0.9865358 0.8716778 0.9728177 0.9846364 0.9840890 0.9861783 0.9854516 0.9691512 
#       129       130       131       132       133       134       135       136 
# 0.9942007 0.9585884 0.9705389 0.9937852 0.9874192 0.9723192 0.9230503 0.9794405 
#       137       138       139       140       141       142       143       144 
# 0.8947527 0.9797055 0.9458421 0.9902488 0.9674660 0.9350646 0.9636413 0.9867931 
#       145       146       147       148       149       150 
# 0.9500265 0.9470544 0.9688318 0.9886543 0.8735433 0.9281727